Abstract

The recent explosion of genome sequences from all major phylogenetic groups has unveiled an unexpected wealth of cases of recurrent evolution of strikingly similar genomic features in different lineages. Here, we review the diverse known types of recurrent evolution in eukaryotic genomes, with a special focus on metazoans. These range from reductive genome evolution to origins of spliced leader trans-splicing, and from tandem exon duplications to gene family expansions. We first propose a general classification scheme for evolutionary recurrence at the genomic level, based on the type of driving force (mutation or selection) and the environmental and genomic circumstances underlying these forces. We then discuss various cases of recurrent genomic evolution under this scheme. Finally, we provide a broader context for repeated genomic evolution, including the unique relationship of genomic recurrence with the genotype-phenotype map, and the ways in which the study of recurrent genomic evolution can be used to understand fundamental evolutionary processes.
Evolutionary Biology in the Era of Ubiquitous Genomes

The explosion of genomic sequences over the past few years has revolutionized our understanding of evolution. Ten years after publication of the human genome sequence (Lander et al. 2001; Venter et al. 2001), hundreds of genomes are now available, spanning nearly all major phylogenetic groups and providing an increasingly focused picture of evolutionary processes. These resources have allowed identification of troves of both broadly shared genomic features and lineage-specific genomic changes. Shared features allow the reconstruction of presumed ancestral traits, for example, the gene complements of the eukaryotic and metazoan ancestors (Putnam et al. 2007; Fritz-Laylin et al. 2010). Lineage-specific changes can in some cases be associated with phenotypic novelties (e.g., Wang et al. 2005; Zhang et al. 2010; McLean et al. 2011). In addition, many instances of a third, more puzzling phylogenetic pattern have been observed: traits whose distribution is "scattered" across the evolutionary tree (fig. 1), indicating repeated independent evolution of similar genomic features in different lineages.



Recurrent Evolution: Phenotypic, Molecular, and Genomic

Recurrent evolution has been extensively studied at a variety of levels and has often led to confusion due to a lack of explicit definitions (Doolittle 1994; Arendt and Reznick 2008). It is therefore useful to begin our discussion by comparing recurrent genomic evolution as defined and reviewed here with previous definitions and work.

Recurrent Phenotypic Evolution 

Recurrent evolution has most commonly been studied at the level of organismal phenotype (fig. 2). This is an extremely rich field, with hundreds of articles spanning three centuries and exploring a wide diversity of recurrent phenotypes and lineages (Scotland 2011). A central concern of phenotypic work has been understanding the physical or genetic causes of recurrence. This pursuit often focuses on distinguishing between convergent evolution and parallel evolution, a distinction that has itself been extensively debated (Arendt and Reznick 2008; Scotland 2011). Generally, the distinctions follow etymology. "Parallel" comes from the Greek for "beside" + "each other" (παρά + ἀλλήλων) and thus involves lineages with initially similar starting points arriving at similar endpoints by taking similar paths. "Convergence," on the other hand, comes from the Latin for "with or together" (com-) and "to incline, tend toward" (vergere). It thus generally involves lineages with different starting points taking different paths to arrive at similar endpoints. For instance, one proposed distinction between parallelism and convergence focuses on the starting points of the two lineages: whether similar (closely related species, parallel) or different (distantly related species, convergent). Another proposed distinction focuses on the paths taken by the two lineages, that is, the specific genetic mutations underlying the changes: whether the same (parallel) or different (convergent) (Arendt and Reznick 2008). Importantly, the two proposed distinctions are related. Because closely related species share more genetic and developmental features, they are more likely to evolve similar traits by identical genetic changes than are species with more disparate biology (although this is not always the case; Arendt and Reznick 2008).

Fig. 1. Phylogenetic distribution of some genomic features across metazoans. Genome-wide/gene-wide traits are mapped to a phylogenetic tree of metazoans (plus choanoflagellates), depicted by empty/solid shapes above/below the tree branches, as indicated in the legend. Red shapes denote recurrent loss of ancestral features, green shapes denote features involving an overall gain of genomic sequence, and blue shapes represent more complex characters. Each symbol indicates that a particular feature has evolved independently at least once within the corresponding taxonomic group. For example, "reductive evolution" on the teleost branch indicates that at least one lineage within the group (pufferfish) is known to show this feature. In the case of WGD, several symbols along the same branch represent the existence of lineages with successive rounds of WGD (e.g., octoploidy, dodecaploidy). Numbers in parentheses indicate which tropomyosin (TPM) exon(s) have duplicated in tandem in each event. The cases represented here are selected examples from the literature and are not intended as an exhaustive list. Many as-yet-unknown cases are expected to be discovered as more whole-genome sequences become available.

Recurrent Molecular Evolution 

An equally diverse range of phenomena is subsumed under the heading of "recurrent molecular evolution." A useful starting point here is Doolittle's (1994) four-category scheme. He identified 1) functional convergence, in which the same molecular function arises multiple times (e.g., unrelated enzymes catalyzing the same reaction; Galperin and Koonin 2012); 2) mechanistic convergence, involving evolution of similar mechanisms for accomplishing similar functions in unrelated molecules (e.g., similar side-chain geometries in unrelated serine proteases; Kraut 1977); 3) structural convergence, in which unrelated sequences fold into similar structures (e.g., repeated evolution of alpha-helices and beta-sheets or similar RNA secondary structures); and 4) sequence convergence, in which similar specific molecular (either DNA or protein) sequences evolve multiple times independently.

Fig. 2. Levels of recurrent evolution. Different levels of biological organization at which recurrent evolution may be studied. Although the phenotype should be considered a continuum across the different scales of biological complexity, for practical reasons we may divide it into three levels. 1) Organismal: individual features such as anatomy, physiology, and behavior. 2) Cellular: characteristics of single cells, including cell movements, secretory capacities, morphology, and organellar composition (equivalent to the organismal level in unicellular species). 3) Molecular: all observed traits below the cellular level, including transcriptome, proteome, biochemical properties, and chromatin structure. The genomic level (gray box) corresponds only to the nucleotide sequence (i.e., elements that can be recognized at the sequence level) and may be comparable to the classic concept of genotype.

Recurrent Genomic Evolution 

We are now in a position to define recurrent genomic evolution, the topic of this review, and to see how it differs from nearly all of these other levels of recurrence. Organismal (i.e., anatomical, physiological, etc.) recurrence and most categories of molecular recurrence are observed at one of the phenotypic levels (fig. 2). Genomic recurrence, by contrast, is directly observed as similar changes in the genotype, that is, at the level of DNA sequence. Notably, then, even most of Doolittle's molecular categories (functional, mechanistic, and structural) do not qualify as genomic recurrence, because they relate to phenotype. These categories are defined at the molecular level and are thus intuitively "closer" to the genomic or genotypic level, but they are in fact phenotypic. This becomes clear when we recall the general definition of phenotype: the observable characteristics of an organism. We may recognize different "levels" across the phenotypic continuum (molecular, cellular, and organismal; fig. 2). All of them, however, are clearly aspects of phenotype and not genotype, because they reflect directly observable characteristics of the organism or cell.

Another fundamental distinction between "classical" and genomic recurrence involves the focus of the study. In classical studies of recurrence in molecules, cells, or organisms, repeated evolution is initially observed at the phenotypic level and only thereafter interrogated at the genotypic/genomic level. By contrast, genotypic convergence involves direct observation of similar or identical changes in the genome in different lineages. These changes are studied regardless of their effects on the various levels of phenotype, whether those effects are similar, different, or even potentially nonexistent. Genotypic (i.e., genomic) recurrence is thus most closely related to Doolittle's fourth category, sequence convergence.

The Importance of Being Recurrent and the "Rules" of Evolution

The study of recurrent evolution is of special importance for understanding the forces shaping genomes. Because evolutionary processes are inherently stochastic, it is difficult to infer evolutionary forces from the occurrence of a given change (or set of changes) in a single lineage. Recurrent evolution of the same genomic characteristics suggests that evolution is to some degree predictable. By revealing commonalities in the evolutionary forces experienced across disparate lineages, it can elucidate the rules of genome evolution (Conway Morris 2009). We believe that the wealth of recurrent genomic features indicates an unappreciated similarity of fundamental forces across lineages. Given the large number of genomic characters and the finite nature of sequence space, genomic recurrence may sometimes occur simply by chance (see below). Nevertheless, many cases have now been unearthed that suggest specific forces driving genome evolution down similar paths in different lineages. Identifying and understanding these forces or causes is perhaps the major challenge in the study of recurrent genome evolution.

Chance, Heterogeneity of Causes, and Genomic Recurrence

Treating recurrence as a valuable and biologically meaningful tool for understanding evolution rests on one notion. Cases of repeated genomic evolution are informative if they occur in excess of the level of coincidence expected simply from the action of stochastic processes in finite sequence space. In some cases discussed here, this null hypothesis can be rejected. Other cases await direct testing, generally because there are not enough data to assess the statistical significance of the pattern and/or to properly define the null hypothesis (e.g., specific mutation rates across lineages). We have chosen to discuss mostly cases that we believe are likely to reflect unexpected levels of recurrence, with some exceptions such as whole-genome duplications (WGDs; see below). Even so, some of these examples may not differ significantly from the chance expectation. Similarly, it is worth pointing out that different instances of a particular trait may be due to different pressures acting in different lineages. This is particularly plausible for cases in which fundamentally different mechanisms for a given genomic change are imaginable. Recurrent patterns caused by different pressures should still be considered true recurrence, but their evolutionary interpretation will be much more obscure. These considerations place similar caveats on most or all cases discussed below. We therefore do not discuss them extensively for each instance, but only in a few particularly enlightening examples. Ultimately, random chance and our proposed explanations represent alternative hypotheses that could and should be directly tested.
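As a toy illustration of what "in excess of chance" can mean, the Python sketch below computes the probability that at least k of n lineages show a given feature if each lineage acquires it by chance with the same probability p. Two simplifying assumptions are made for illustration: lineages are independent, and a single value of p is known. Real lineages share ancestry, and estimating p is precisely the difficulty noted above. The function name prob_at_least_k is hypothetical.

```python
from math import comb


def prob_at_least_k(n: int, k: int, p: float) -> float:
    """Binomial upper tail P(X >= k) for X ~ Binomial(n, p).

    Toy null model (an assumption, not a published test): n lineages evolve
    independently, and each acquires the feature by chance with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k <= 0:
        return 1.0  # at least zero lineages is always satisfied
    # Sum the binomial probabilities of observing i = k, ..., n lineages.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical numbers: 5 of 50 lineages show the feature; chance rate 1%.
    print(f"P(>= 5 of 50 by chance) = {prob_at_least_k(50, 5, 0.01):.2e}")
```

Under this toy null model, a very small tail probability argues against pure coincidence. A large tail probability does not. Shared ancestry and heterogeneous rates would both need to be modeled in a real test.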

The Causes of Recurrent Evolution of Genomic Features

What forces may explain genomic recurrence? Recurrent anatomical or physiological characters are usually (and reasonably) assumed to reflect adaptation, often due to shared peculiarities of the organisms' environmental niches. The potential causes of observed recurrent genomic features, in contrast, are more diverse and may be very different for different recurrent traits. Indeed, in some cases, the adaptive value of repeated genomic outcomes is dubious. In understanding the forces driving recurrent genomic evolution, we believe that the following two axes are particularly important.

Forces Driving the Pressure: Mutation, Positive Selection, or Relaxed Selection

A species undergoes a genomic change when 1) a spontaneous mutation occurs and 2) the resulting mutated allele spreads through the population. The second step depends strongly on selective strength and efficiency, which incorporate demography, effective population size, and related factors. Thus, insofar as recurrent changes reflect similar pressures or constraints across lineages, these similarities may involve forces that are "mutational" or "selective" (or both). The notion that selection can impart directionality to evolutionary change is familiar to any evolutionary biologist. The idea that mutation can be directional may be less familiar (the interested reader should consult Yampolsky and Stoltzfus 2001). Mutation becomes a directional force whenever a certain class of mutation (G-to-A, small genomic deletions, intron loss, etc.) is more frequent than its reverse (A-to-G, small insertions, intron gain, etc.). Thus, all that is needed for mutation-driven recurrent evolution is that multiple lineages experience similar mutational biases in parallel.
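To make directional mutation concrete, consider a minimal two-state model with no selection. Each site is either G/C or A/T. G/C-to-A/T mutations occur at rate u, and the reverse at rate v, per site per unit time. The expected G/C fraction f then follows df/dt = v(1 - f) - u f and relaxes toward the equilibrium v/(u + v) from any starting value. Any persistent asymmetry between u and v therefore pushes composition in one direction. The sketch assumes constant, site-independent rates. The function name and the rates in the usage example are hypothetical.

```python
import math


def expected_gc_fraction(f0: float, u: float, v: float, t: float) -> float:
    """Expected G/C fraction after time t under mutation alone (no selection).

    Assumed two-state model: G/C -> A/T at rate u and A/T -> G/C at rate v,
    per site per unit time, starting from a G/C fraction of f0.
    """
    total = u + v
    if total == 0.0:
        return f0  # no mutation at all: composition cannot change
    f_eq = v / total  # equilibrium set purely by the mutational bias
    # Closed-form solution of df/dt = v * (1 - f) - u * f.
    return f_eq + (f0 - f_eq) * math.exp(-total * t)


if __name__ == "__main__":
    # Hypothetical rates: G/C -> A/T three times as frequent as A/T -> G/C.
    u, v = 3e-9, 1e-9
    for f0 in (0.6, 0.4):  # two lineages with different starting compositions
        trajectory = [round(expected_gc_fraction(f0, u, v, t), 3) for t in (0.0, 5e8, 1e9, 5e9)]
        print(f0, trajectory)
```

Two lineages that start from different compositions but share the same mutational bias approach the same equilibrium. In this sense, a shared mutational bias alone can produce recurrent genomic outcomes.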

For selective pressure, a second question is whether the recurrence reflects similar "positive" selective pressure in multiple lineages or a similar "relaxation" of selective pressure. Notably, differences in selective pressure include not only classical differences in fitness but also differences in effective population size (Ne), which alter the effectiveness of selection relative to drift. Indeed, one influential model predicts that several general aspects of genome architecture should evolve recurrently in lineages exposed long enough to similar Ne (and mutation rates) (Lynch and Conery 2003; Lynch 2006, 2007).
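The role of Ne can be illustrated with Kimura's diffusion approximation for the fixation probability of a single new mutation with selection coefficient s in a diploid population. The sketch below assumes that census and effective population sizes are equal (N = Ne). The formula then reduces to u = (1 - exp(-2s)) / (1 - exp(-4 Ne s)). The function names and the value of s in the usage example are illustrative assumptions. The example shows how the same mildly deleterious change can behave almost neutrally when Ne is small yet be efficiently purged when Ne is large.

```python
import math


def _log_expm1(x: float) -> float:
    """log(exp(x) - 1) for x > 0, computed without overflow for large x."""
    if x > 30.0:
        return x + math.log1p(-math.exp(-x))
    return math.log(math.expm1(x))


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's diffusion approximation for a single new diploid mutant.

    Assumes N = Ne, so the initial frequency is 1 / (2 Ne) and
    u = (1 - exp(-2 s)) / (1 - exp(-4 Ne s)).
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)  # neutral limit: equals the initial frequency
    a = -2.0 * s          # numerator exponent
    b = -4.0 * n_e * s    # denominator exponent
    if s > 0.0:
        # Both exponents are negative, so expm1 cannot overflow.
        return math.expm1(a) / math.expm1(b)
    # Deleterious case: both exponents are positive and b can be huge,
    # so form the ratio in log space.
    return math.exp(_log_expm1(a) - _log_expm1(b))


if __name__ == "__main__":
    s = -1e-5  # hypothetical mildly deleterious insertion
    for n_e in (1e3, 1e5, 1e7):
        relative = fixation_probability(s, n_e) * 2.0 * n_e  # vs. a neutral mutant
        print(f"Ne = {n_e:.0e}: fixation probability relative to neutral = {relative:.3g}")
```

The relative fixation probability is close to 1 when 4 Ne |s| is much smaller than 1 and collapses toward 0 when it is much larger. This is the sense in which reduced Ne relaxes selection against slightly deleterious changes.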

Nature of the Pressure: General, Recurrent Environmental, or Recurrent Genetic

Another important consideration involves the distribution of the pressure driving convergence and the source of that pressure. Similar evolutionary pressures and constraints in two lineages can be either 1) "general" (or ancestral), that is, applying to most or all lineages within a group, or 2) "recurrent," that is, pressures that themselves arose independently in only a subset of lineages. For recurrent pressure, a second question is whether the pressure arose due to a previous change in the genome of the species ("genetic" or intrinsic) or in its environment ("environmental" or extrinsic).

Using this framework, we next review some of the major known cases (or classes) of recurrent genomic evolution (summarized in table 1), beginning with the illustrative case of reductive genome evolution (RGE). Notably, for many of the phenomena discussed here, the causes remain unclear and are often debated. Our goal is to frame the questions and to engender debate, not to arbitrate between competing hypotheses. In addition, we have chosen to focus on eukaryotic nuclear genomes. We therefore do not discuss the equally numerous interesting cases of recurrent evolution in prokaryotes and eukaryotic organelles.

An Example: On the Causes of Reductive Genome Evolution

These distinctions are illustrated by different hypotheses about the evolutionary causes of RGE. RGE is perhaps the best-known instance of recurrent genome evolution. It has been observed in nearly all eukaryotic superkingdoms (Venkatesh et al. 2000; Lane et al. 2007; Morrison et al. 2007; Opperman et al. 2008; Slamovits and Keeling 2009; Ankarklev et al. 2010; Corradi et al. 2010). It can include pronounced gene loss, elimination of repetitive elements, evolution of overlapping genes, reduction of average intron size and/or intron number, and other genomic changes leading to more compact genomes. In addition, significant genome contractions have occurred even in typically large genomes. For instance, multiple mammalian orders have experienced parallel patterns of genome contraction following the Cretaceous-Tertiary (KT) boundary (Rho et al. 2009). These contractions included loss of nuclear mitochondrial sequences (NumtS), pseudogenes, and long terminal repeat retrotransposons.

Table 1
Possible Causes of Recurrent Genomic Evolution

Classification axes: driving force (selectional [positive or relaxed] or mutational) and nature of the pressure (general/ancestral, or recurrent [environmental or genetic]). The probability of occurrence by chance is given for each feature.

Feature                                                  Probability of occurrence by chance
Genomic organization
  Reductive evolution                                    Null
  Genome expansion                                       Low
  WGDs                                                   High
  Sex chromosomes                                        Low
  Nucleotide composition                                 Low
Genome-wide gene structures
  Massive intron loss                                    Low
  Strong intron boundaries                               Null
  SLTS                                                   Low
  Complete loss of ancestral U12 introns                 Low
Gene/gene family level
  Gene family expansions                                 High
  Cluster formation and assembly of syntenic blocks      Low
  Disruption of gene clusters and other syntenic blocks  High
  Gene losses                                            High
Specific intragenic features
  Tandem exon duplications                               Low
  Gene structures                                        High
  Loss of gene segments                                  Low

Several hypotheses have been proposed for genome reduction. First, RGE is often argued to reflect positive selection for loss of inessential genomic elements, acting specifically on parasitic/fast-replicating lineages. This hypothesis is an example of a recurrent (acting only or especially on some lineages), environmental (due to considerations of an organism's niche), "positive-selective" pressure. An alternative is that RGE reflects loss of genomic sequences that are no longer efficiently maintained by selection ("relaxed-selective" pressure). There are several possible reasons for relaxed-selective pressure. Changes in lifestyle could render some processes obsolete, an example of "recurrent-environmental" causes. For instance, parasites that obtain products from their hosts may lose biosynthetic pathways. Reduced effective population size in parasites reduces the efficiency of selection and could thereby also lead to loss of weakly selected elements (also recurrent-environmental) (Lynch 2007). In some cases, loss of one gene may render related/interacting genes nonfunctional, leading to their loss. This relaxed-selective pressure is due to changes within the organism's genome (gene loss) and is thus a case of recurrent-genetic pressure. Finally, some aspects of RGE may simply reflect a strong tendency toward deletion at the genome level (mutational pressure). Such a deletion bias could arise from changes in the DNA replication/repair machinery (genetic). Alternatively, it could arise from changes in the environment, such as increased ultraviolet exposure leading to a greater rate of double-strand breaks in DNA (environmental). Notably, it is also conceivable that the pressures governing recurrent RGE are general. Gene loss is known even in species without striking genome reduction, and many lineages appear to experience an excess of DNA deletions over insertions (Petrov 2002a, 2002b). From this perspective, lineages undergoing RGE could be exhibiting general pressures that have simply proceeded to a more advanced stage.

Multiple Levels of Recurrent Genomic Evolution

We next proceed to a discussion of different examples of observed genomic recurrence. We have organized these examples by the "scale" of their changes. Recurrent genomic evolution can be recognized at multiple scales. These range from whole-genome patterns such as RGE, which globally affect numerous individual features at the same time, to specific changes within individual genes, such as the recurrent deletion of a regulatory DNA motif. These different levels are interconnected and, in many cases, probably interdependent. For clarity, however, we divide the examples discussed here into four broad categories. We will first review cases of genome-wide patterns of recurrent evolution, subdivided into changes in genomic organization (such as RGE) and global changes of gene structures. Then, we will focus on cases affecting single genes or gene families. Last, we will zoom in to discuss examples of recurrent evolution of features within the individual genes themselves.

Cases of Recurrent Evolution of Genomic Organization

Expansive Genome Evolution 

Another repeatedly observed evolutionary trajectory is pronounced expansion of genome size and content. At least in animals, plants, and fungi, some species have dramatically increased total DNA content (Gregory et al. 2007). In some cases, gene numbers have increased severalfold relative to related lineages, often through WGDs (see below). These increases have been accompanied by evolution of large gene families, apparently increased intergenic and intron lengths, and, in nearly all cases, massive proliferation of repetitive elements (e.g., Lander et al. 2001; Bennetzen 2002; Kidwell 2002; Piegu et al. 2006; Ungerer et al. 2006; Gregory et al. 2007). Other lineages may also have experienced similar histories. However, systematic undersampling of large genomes outside of these three groups has hampered our knowledge of other such taxa. Here, again, the causes of convergent genome expansion remain unclear. Because massive genome expansions require hundreds of mutations accumulating in the same direction, they are unlikely to evolve simply by chance. Some hypotheses closely associate genome expansion with multicellularity. One possibility is that multicellularity promotes evolution of regulatory complexity and gene family expansion (Vogel and Chothia 2006; Taft et al. 2007; Lang et al. 2010). Another influential hypothesis suggests that genome expansion in multicellular organisms largely reflects reduced selection against mildly deleterious insertions, such as gene duplicates, transposable element insertions, and introns. Such relaxed selection would be expected in species with reduced Ne, such as plants or animals (Lynch and Conery 2003; Lynch 2007). However, recent work questioning the correlation between Ne and genomic complexity urges caution (Whitney and Garland 2010, but see Lynch 2011). Finally, genetic changes, such as high expression of active reverse transcriptases, may lead to increased proliferation of repeated elements, a recurrent-genetic mutational cause.



Whole-Genome Duplications 

A polyploid is a cell, organism, or species that contains more than two homologous sets of chromosomes. The mutation that produces polyploids is referred to as WGD or polyploidization. It has been repeatedly described in many eukaryotic groups, including animals (Bisbee et al. 1977; Amores et al. 1998; Gallardo et al. 1999; Evans et al. 2004; Edger and Pires 2009), plants (Fawcett et al. 2009), ciliates (Aury et al. 2006), oomycetes (Martens and Van de Peer 2010), and fungi (Wolfe and Shields 1997; Ma et al. 2009). Extensive gene losses in paleopolyploids could result in a diploid-like gene complement. Nevertheless, WGDs are generally not reversible and are therefore a case of a mutational ratchet, a "general mutational" cause (see below). In some lineages, this phenomenon is especially pervasive, with a high prevalence of multiple extra rounds of polyploidization after a first WGD event (Evans et al. 2004). This pattern is especially common in plants but is also found in several animal lineages. However, it is not clear whether recurrent WGDs, although very frequent, occur and accumulate more often than expected for a random process. From a selectional perspective, WGDs can have immediate phenotypic effects (Kennedy et al. 2006; Thompson and Merg 2008), but these may not explain their fixation in most cases. That said, Fawcett et al. (2009) have suggested that plant lineages that underwent WGDs had a better chance of surviving the KT mass extinction. In addition, WGDs have been postulated to have served as a frequent source of increased potential for subsequent evolution (Blomme et al. 2006; Zhang and Cohn 2008). Hypotheses linking WGDs with large taxonomic radiations and evolutionary novelties, however, have been controversial (Donoghue and Purnell 2005; Hurley et al. 2007). In sum, WGD may result in dramatic recurrent patterns at a genome-wide level without being caused by common evolutionary forces acting on a particular set of lineages. Instead, it may simply reflect a high mutational frequency, that is, a high rate of mutations leading to polyploidization.

Sex Chromosomes 

In many distantly related eukaryotes, sex is determined at the genetic level by chromosomal complement. This is thought to involve a cascade of events driven largely by sexually antagonistic selection: 1) a gene at a previously autosomal locus develops a dominant ability to determine sex; 2) recombination is suppressed at this locus; 3) additional sex-related genes accumulate nearby on the chromosome, further driving recombination suppression; 4) the chromosome containing the dominant sex determinant (Y/W) degrades stepwise; and 5) traffic of genes between the sex chromosomes and autosomes increases. Evolution of similar sex chromosome systems has occurred repeatedly in vertebrates, invertebrates, fungi, and plants (Fraser et al. 2004; Fraser and Heitman 2005; Bergero et al. 2007; Bellott et al. 2010; Charlesworth and Mank 2010; Davis and Thomas 2010; Kaiser and Bachtrog 2010; Ellegren 2011). Sex chromosomes are thus an example of a "selectional" cascade of events triggered by recurrent genetic changes. Finally, X-autosome balance is another interesting case of recurrent evolution of a genome-based sex determination system. It is found in at least Drosophila and Caenorhabditis (reviewed in Haag 2005) and in the plant genus Rumex (Navajas-Perez et al. 2005).

Changes in Global and Local Nucleotide Composition 

Global nucleotide composition (or GC content) ranges widely across eukaryotic and prokaryotic genomes. In particular, many divergent lineages have recurrently evolved highly AT-rich genomes throughout eukaryotic evolution (Gardner et al. 2002; Eichinger et al. 2005; Eisen et al. 2006; Ghedin et al. 2007). Highly GC-rich genomes, by contrast, have evolved more rarely among eukaryotes (Merchant et al. 2007). These differences are likely due to a combination of selectional and mutational pressures, including mutational bias and biased recombination-associated DNA repair (Yampolsky and Stoltzfus 2001; Birdsell 2002). Interestingly, genome-wide GC content is a major determinant of global codon bias (Hershberg and Petrov 2009). Independent evolution of similar GC contents in two different species will therefore usually result in recurrent evolution of similar codon usage preferences.
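The link between genome-wide composition and codon usage can be seen by measuring GC content at third codon positions (GC3), where most substitutions are synonymous. In the sketch below, two hypothetical coding sequences encode the same short peptide (Lys-Phe-Leu-Gly) with different synonymous codons. GC3 is used here only as a simple illustrative summary and is not necessarily the measure used in the studies cited above. The function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T characters of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T characters")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_fraction(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return gc_fraction(cds[2::3])  # every third base, starting at codon position 3


if __name__ == "__main__":
    # Two hypothetical sequences encoding Lys-Phe-Leu-Gly with different codons.
    at_rich = "AAATTTTTAGGA"  # AAA TTT TTA GGA
    gc_rich = "AAGTTCCTGGGC"  # AAG TTC CTG GGC
    for name, cds in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
        print(name, round(gc_fraction(cds), 3), gc3_fraction(cds))
```

Both sequences encode the same peptide, yet their third positions are entirely A/T in one case and entirely G/C in the other. A shift in genome-wide GC content thus translates directly into a shift in preferred codons.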

The same pressures, especially local differences in recombination (Duret 2006; Duret and Arndt 2008), are likely also to cause local differences in GC content within genomes (e.g., isochores). Notably, these regions are continuously evolving. For example, several mammalian lineages are undergoing a recurrent process of GC-rich isochore erosion, with a significant trend of G/C to A/T substitutions, whereas others are independently increasing their overall GC content (Duret et al. 2002; Belle et al. 2004; Romiguier et al. 2010). Interestingly, these trends not only produce repeated patterns of nucleotide composition at a genomic scale but sometimes also result in striking recurrence of GC content at specific genes (e.g., the gene RAG1 in two marsupial species; Gruber et al. 2007).

Cases of Genome-Wide Recurrent Evolution of Gene Structures

Widespread Genome-Wide Intron Loss 

Most studied eukaryotic species have plentiful spliceosomal introns (at least one per gene on average). Several distantly related lineages, however, contain far fewer (<0.1 per gene; Matsuzaki et al. 2004; Vanacova et al. 2005; Morrison et al. 2007), apparently due to independent episodes of massive intron loss (Irimia and Roy 2008). Why should this be? Perhaps the leading hypothesis is that massive intron reduction reflects strong positive selection for intron loss in lineages that are optimized for fast replication (Doolittle 1978). This is a recurrent-environmental positive-selection model, since it invokes increased positive selection due to peculiarities of species' environments. It closely parallels the selective explanation for RGE. On the other hand, massive reduction in intron number could reflect "runaway" mutation. For instance, retroelement invasion could drive widespread retroposition and hence elevated rates of creation of intronless DNA copies of genes (Roy and Penny 2007). This is a recurrent-genetic mutational model, since it invokes increased mutation due to peculiarities of species' genomes (retroelement invasion). Finally, evidence for more gradual intron number reduction in many lineages suggests a general mutational pressure toward intron loss, potentially due to a near absence of intron gain in many lineages (Roy and Irimia 2009a). This hypothesis provides an example of a "ratchet-like" effect (Covello and Gray 1993; Doolittle 1998). In such a ratchet, transition in one direction (from intron presence to absence) occurs much more readily than the reverse (intron gain), leading to strong directionality in evolution. Ratchets can be due to mutation, selection, or a complicated combination of the two, and they are common across recurrent evolution of genomic features. We return below to the role of ratchet processes in the evolution of genome complexity and to constructive neutral evolution (CNE; Stoltzfus 1999; Gray et al. 2010; Doolittle et al. 2011; Speijer 2011).
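A simple deterministic model shows how a near-absence of intron gain produces ratchet-like directionality. Each existing intron is lost at a constant rate per intron, and new introns are gained at a constant rate per gene. The expected number of introns per gene n then follows dn/dt = gain - loss * n and relaxes toward gain/loss. When gain is close to zero, that equilibrium is close to zero, and the trajectory runs essentially one way. This model, the function name, and the rates in the usage example are hypothetical illustrations, not the analysis of Roy and Irimia (2009a).

```python
import math


def expected_introns_per_gene(n0: float, loss_rate: float, gain_rate: float, t: float) -> float:
    """Expected introns per gene after time t (hypothetical constant-rate model).

    Each existing intron is lost independently at loss_rate (per intron per
    unit time); new introns appear at gain_rate (per gene per unit time).
    Solves dn/dt = gain_rate - loss_rate * n with n(0) = n0.
    """
    if loss_rate <= 0.0:
        return n0 + gain_rate * t  # without loss, introns can only accumulate
    n_eq = gain_rate / loss_rate  # equilibrium intron density
    return n_eq + (n0 - n_eq) * math.exp(-loss_rate * t)


if __name__ == "__main__":
    # Hypothetical rates chosen so that the equilibrium (gain/loss) is 0.01 introns per gene.
    loss, gain = 1e-9, 1e-11
    for t in (0.0, 1e9, 3e9, 1e10):
        print(f"t = {t:.0e}: {expected_introns_per_gene(5.0, loss, gain, t):.3f} introns/gene")
```

Starting from an intron-rich state, the expected density decays toward the low equilibrium. No path leads back to high intron density unless the gain rate itself changes, which is what makes the process ratchet-like.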

Transformation of Intron Structures after Massive Intron Loss

In each case in which a eukaryotic lineage has experienced 
nearly complete intron loss, the few remaining introns ex- 
hibit modified splicing signals, with strengthened consensus 
sequences for core splicing motifs (5' splice site and branch 
point), and even a highly constrained distance between the
branch point and the 3' intron boundary (Irimia et al.
2007, 2009; Irimia and Roy 2008; Schwartz et al. 2008).
Such a tight association between two genomic transformations
(intron loss and intron sequence change) suggests that
genetic changes associated with one lead to selective
pressures driving the other: a case of recurrent genetic
positive-selective pressures. However, although several
mechanistic hypotheses have been proposed (Irimia and Roy
2008; Irimia et al. 2009), a clear explanation is still lacking.
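
One common way to quantify how strong a splice-site consensus is, is the information content of the aligned sites, in bits. The sketch below uses a hypothetical helper and made-up 6-nt sequences rather than real data. It sums 2 - H over positions, where H is the Shannon entropy of the nucleotide frequencies at that position, so a more uniform set of sites (as reported for the remaining introns in nearly intronless genomes) scores higher.

```python
import math
from collections import Counter


def information_content(sites: list[str]) -> float:
    """Total information content (bits) of aligned, equal-length DNA sites.

    For each position: 2 bits minus the Shannon entropy of the observed
    nucleotide frequencies. No pseudocounts or background correction.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    total = 0.0
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


# Hypothetical 5' splice-site windows (illustrative, not real data).
variable_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTGGGT"]
uniform_sites = ["GTAAGT", "GTAAGT", "GTAAGT", "GTAAGT"]

print(round(information_content(variable_sites), 2))  # 9.38
print(information_content(uniform_sites))             # 12.0
```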

Spliced Leader Trans-Splicing 

Spliced leader trans-splicing (SLTS) is a variation on the 
spliceosomal splicing mechanism that attaches short 
trans-encoded RNA "leader" sequences to the 5' end of
transcripts of a generally well-defined subset of genes. SLTS 
systems exhibit a highly punctate phylogenetic distribution 
across protists and animals (Lukes et al. 2009; Roy and Irimia
2009b; Douris et al. 2010; fig. 1). Phylogenetic evidence 
suggests frequent evolution of SLTS from a non-SLTS ances- 
tor; by contrast, no case of loss of SLTS in any lineage is 
known (Roy and Irimia 2009b), although, with current data 
and methods for detecting SLTS, cases of secondary loss of 
SLTS are hard to prove. This suggests a model in which 
1) new SLTS systems arise at some rate over evolutionary
time, likely from spliced leader-like sequences derived from
traditional spliceosomal RNAs through largely neutral
mutations (Lukes et al. 2009) and 2) degradation of defunct 5'
untranslated regions (UTRs) following the evolution of SLTS
makes subsequent loss of SLTS very unlikely. Thus, SLTS may
be another case of a mutational ratchet, in which transition
from one state to another is common over evolutionary time
but the reverse is rare, leading to recurrent evolution of
the same feature. Interestingly, the cascade of
events leading to the evolution of SLTS may result in in- 
creased molecular complexity, by enabling new molecular 
paths of gene expression. 
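
The inference of multiple independent origins from a punctate distribution can be sketched with Fitch parsimony, which counts the minimum number of presence/absence changes needed on a fixed tree. The tree and taxon names below are hypothetical, and the function is an illustrative sketch rather than the method used in the cited studies. Fitch counts do not by themselves say whether a change is a gain or a loss; if, as suggested above, SLTS is essentially never lost, every change is read as an independent origin.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns the candidate state set at the root and the minimum number of
    state changes (1 = SLTS present, 0 = absent).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and SLTS presence/absence pattern.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
slts = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0, "F": 0, "G": 1, "H": 0}
root_states, changes = fitch(tree, slts)
print(root_states, changes)  # {0} 3
```

Here the most parsimonious reconstruction places a non-SLTS ancestor at the root and three separate changes, one per SLTS-bearing tip.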

One instance of the increased molecular complexity asso- 
ciated with SLTS is the evolution of polycistronic transcripts, 
which is tightly associated with SLTS in diverse eukaryotic 
lineages (and is very rare in eukaryotes without SLTS). This 
difference likely reflects the fact that in eukaryotes,
translation of downstream open reading frames (ORFs) is
generally inefficient. As such, in eukaryotes that lack SLTS,
polycistronic transcripts are expected to be rare; however, SLTS upstream of
ORFs can create monocistronic mature messenger RNAs 
from polycistronic transcripts, resolving this difficulty. 
Dynamics of operon creation and loss may also reflect 
a ratchet: Mutations affecting transcription termination of 
upstream genes and leading to long transcripts may allow 
effective expression of trans-spliced downstream genes 
from polycistronic messages; on the other hand, internal 
promoters in operons are likely to eventually degrade,
inhibiting the opposite transition, from operons back to
independent promoters. In total, then, the evolution of SLTS
(and of operonic systems) is perhaps the best example of
recurrent CNE (Lukes et al. 2009), an alternative mechanism
to generate increased biological diversity (Stoltzfus 1999;
Gray et al. 2010; Doolittle et al. 2011; and see Speijer 2011
for counterarguments).

Massive Loss of U12 Introns 

U12 or minor introns are a rare class of introns that are re- 
moved by a distinct spliceosomal machinery and characterized 
by strict extended splice signals. U12 introns are likely to have 
been present in the last common ancestor of eukaryotes but
have been independently reduced in number or completely
lost in many lineages (Russell et al. 2005; Alioto 2007; Davila
Lopez et al. 2008; Roy and Irimia 2009b). The dynamics may
be governed by a general mutational ratchet (in this case, not
associated with CNE): whereas both loss of U12-intron
sequences and conversion from U12- to "standard" major
U2-spliceosomal introns are routinely observed, and simple
mutations causing these changes have been identified in
the laboratory, the opposite (U2-to-U12) has never been
documented (Burge et al. 1998; Roy and Irimia 2009b).

Cases of Recurrent Genome Evolution at the Gene or Gene Family Level

Gene Duplications and Family Expansions 

Gene duplication is a frequent phenomenon (Lipinski et al.
2011) that affects a wide variety of gene families and
biological processes, suggesting that much recurrent gene
duplication may be largely stochastic. However, exceptions
in which recurrent gene duplication has underpinned
parallel phenotypic evolution are also known. One clear
example involves duplication of RNase genes (Zhang 2006). In
two lineages of leaf-eating monkeys, a new digestive
tract-specific RNase gene arose by duplication of the same
ancestral RNase and acquired identical amino acid changes
altering RNase activity and resulting in improved leaf
digestion. Such cases represent recurrent genomic evolution
due to selective environmental pressures acting on a specific
subset of lineages.

Other cases reflect general environmental adaptation
through recurrent massive gene family expansion. Some
biological functions, such as immunity, chemoreception, and
detoxification, require interaction with or recognition of
a vast range of substrates; thus, increased molecular
diversity of paralogs within the genome could be favored.
For instance, cytochrome P450 genes, which participate
in detoxification of various compounds, have undergone
pronounced independent expansion in many metazoan
lineages (Thomas 2007; Baldwin et al. 2009). A similar
situation is found in chordate olfactory receptors, where a
correlation with environmental positive-selective pressures
is evident (Niimura and Nei 2007; Niimura 2009). On the
other hand, other cases of recurrent massive gene family
expansion, which are overwhelmingly statistically significant
relative to a random expectation obtained from related gene
families, suggest important adaptations of unknown
functional significance, raising important questions for
further exploration (e.g., EXTK tyrosine kinases, for which
dozens of members have independently evolved in several
lineages [fig. 1], in contrast to all other related tyrosine
kinase families, for which nearly no gene duplications are
known in other metazoan lineages; D'Aniello et al. 2008).
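
A statement such as "statistically significant relative to a random expectation obtained from related gene families" can be made concrete with a simple test. As a minimal sketch, assume that duplications in a family follow a Poisson distribution whose mean is estimated from related families; the upper-tail probability of the observed count then serves as a p-value. The function name and the numbers in the example are hypothetical, and a real analysis would also need to account for branch lengths and multiple testing.

```python
import math


def poisson_upper_tail(k: int, expected: float, n_terms: int = 1000) -> float:
    """P(X >= k) for X ~ Poisson(expected), with expected > 0.

    The tail is summed directly in log space (rather than computed as
    1 - CDF) so that very small p-values stay accurate; n_terms must be
    large relative to `expected`.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, k + n_terms):
        total += math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
    return total


# Hypothetical: related families average 0.5 duplications per lineage,
# but the focal family shows 20 duplications in one lineage.
print(poisson_upper_tail(20, 0.5))  # on the order of 1e-25
```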

Cluster Formation and Assembly of Syntenic Blocks 

Pairs or groups of genes may be closely physically linked in
different species for functional reasons. In most cases,
this reflects retention of an ancestral association; however,
some instances of repeated evolution of physical linkage
between pairs or groups of genes have been described.
One set of these involves recurrent evolution of clusters
of paralogous genes, presumably by tandem gene
duplication and selection against gene translocation. These
genomic structures may provide a genetic positive-selective
advantage by allowing subtle coding sequence and
transcriptional diversification of new gene copies under the
control of a shared set of regulatory elements (Tena et al. 2011).
Accordingly, many described cases correspond to key
developmental genes with complex transcriptional expression
patterns (Peterson 2004; Duncan et al. 2008; Irimia et al.
2008; Kuraku et al. 2008; Takatori et al. 2008; Kerner
et al. 2009; Negre and Simpson 2009); for example, Iroquois
genes have independently evolved gene clusters in at least
five metazoan lineages (Irimia et al. 2008; Takatori et al.
2008; Kerner et al. 2009), arguing for positive selection
rather than stochastic occurrence. More rarely, recurrent
linkage of nonparalogous genes may occur, and this
association may be favored because of functional advantages
(e.g., improved coordination of expression), as for three
genes involved in galactose metabolism in two divergent
fungal phyla (Slot and Rokas 2010).

Disruption of Highly Conserved Gene Clusters and Other Syntenic Blocks

Ancestral blocks of syntenic genes have been maintained in
diverse modern animals, indicating strong selection for their
retention across lineages, generally associated with specific
developmental programs (e.g., Hox gene clusters; Duboule
2007). However, these associations have been
recurrently disrupted in several different animal lineages
(Ferrier and Holland 2002; Seo et al. 2004; Pierce et al.
2005; Duboule 2007; Negre and Ruiz 2007). This indicates
that these linkages have repeatedly become nonessential,
suggesting modification of fundamental animal
developmental programs, a potential case of relaxed-selective
pressures. Similarly, disruption of ancient associations of
phylogenetically unrelated genes that act as genomic
regulatory blocks (Engstrom et al. 2007; Kikuta et al. 2007) has
also been reported (e.g., Iroquois genes with Sowah genes
in several lineages; Irimia et al. 2008; Maeso et al. 2012).

Gene Losses 

Gene losses constitute an obvious example of a nonobstructive
mutational ratchet for largely nonessential genes. In extreme
examples, such as the GFP gene family in metazoans and the
oxylipin pathway genes in holozoans, the taxonomic
distribution implies at least five independent losses (Deheyn
et al. 2007; Lee et al. 2008; fig. 1). Alternatively, the loss of
the same selective pressure in two lineages due to a common
change in lifestyle and/or developmental process (e.g., loss
of vision in lightless environments; Protas et al. 2011) may
result in dispensability of the same genes and thus in their
recurrent loss (environmental relaxed-selection). An example
of this is the repeated loss of oxidative phosphorylation
complex I genes in anaerobic fungi (Marcet-Houben et al.
2009). In such cases, the loss of one of the genes involved
in a particular protein complex or biological pathway could
render its interacting partners nonfunctional, further
promoting the loss of the latter. This is exemplified by the
absence of all six proteins that make up the fifth adaptor
protein (AP-5) complex, independently in five different
eukaryotic lineages (Hirst et al. 2011).
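
A claim such as "the taxonomic distribution implies at least five independent losses" rests on a Dollo-style argument: if a gene was present in the common ancestor and, once lost, is never regained, each maximal clade lacking the gene requires its own loss. The following is a minimal sketch under exactly those assumptions; the tree, taxon names, and function names are hypothetical.

```python
from typing import Iterable, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """All leaf names under a node (trees are nested tuples; leaves are str)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def dollo_losses(tree: Tree, present: Iterable[str]) -> int:
    """Minimum number of independent losses, assuming the gene was present
    at the root of `tree` and was never regained after a loss."""
    present = set(present)
    if not any(leaf in present for leaf in leaves(tree)):
        return 1  # one loss on the branch leading to this gene-less clade
    if isinstance(tree, str):
        return 0  # a leaf that still carries the gene
    return sum(dollo_losses(child, present) for child in tree)


tree = ((("A", "B"), "C"), (("D", "E"), ("F", "G")))
print(dollo_losses(tree, present={"A", "D", "F"}))  # 4
```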

Genic redundancy, by individual gene duplication or
WGD, creates yet another evolutionary scenario for
recurrent gene losses (genetic relaxed selection). In these
cases, although simple chance is likely to underlie most
patterns of gene loss, there are instances in which not all
genes seem to be equally prone to retention. For example,
some paralogs have been repeatedly lost specifically in
different vertebrate lineages, as in the case of Pdx2 genes in
teleosts and tetrapods (Mulley and Holland 2010), EvxB in
elephant shark and tetrapods (Ravi et al. 2009), Alx3 in
frogs, lizards, and chicken (McGonnell et al. 2011), and the
globin-E gene (GbE) in all major vertebrate lineages but birds
(Hoffmann et al. 2011). (It should be noted, however, that
although intriguing and suggestive, these patterns of
coincidental loss across four or five major vertebrate lineages
cannot be shown to differ significantly from the null
expectation because of the small sample size. Further
availability of genomic sequences should overcome this
limitation.) More globally, this nonrandom pattern of paralog
losses seems to be the rule in yeast (Scannell et al. 2007).
Finally, some recurrent losses
may reflect positive-selective genetic pressure: for instance, 
recurrent reduction to a single copy of the same gene 
families following WGD in plants, fungi, and animals likely 
reflects strong purifying selection on gene dosage (Paterson 
et al. 2006). 
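
The small-sample caveat above can be quantified under one simple null model, which is an illustrative assumption rather than a test taken from the cited studies: after a duplication, each lineage independently loses one of two paralogs at random. The probability that all n lineages lose the same paralog, whichever it is, is then 2 × 0.5^n, so with four or five lineages even a perfectly consistent pattern cannot reach p < 0.05. The function names are hypothetical.

```python
def prob_all_lose_same(n_lineages: int, n_paralogs: int = 2) -> float:
    """Null probability that every lineage loses the same paralog (any one),
    if each lineage independently loses one of n_paralogs copies uniformly."""
    if n_paralogs < 2:
        raise ValueError("need at least two paralogs")
    return n_paralogs * (1.0 / n_paralogs) ** n_lineages


def min_lineages_for(alpha: float, n_paralogs: int = 2) -> int:
    """Smallest number of lineages for which a perfectly consistent loss
    pattern would be significant at level alpha under this null."""
    n = 1
    while prob_all_lose_same(n, n_paralogs) >= alpha:
        n += 1
    return n


for n in (4, 5, 6):
    print(n, prob_all_lose_same(n))  # 0.125, 0.0625, 0.03125
print(min_lineages_for(0.05))        # 6
```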

Cases of Recurrent Evolution of Specific Intragenic Features

Tandem Exon Duplications 

Seven to 17% of metazoan genes have tandem exon dupli- 
cations (Letunic et al. 2002; Gao and Lynch 2009), generally 
associated with mutually exclusive alternative splicing 
(Kondrashov and Koonin 2001; Irimia et al. 2008). Such
duplications generate internal redundancy (internal paralogy),
which alternative splicing can exploit to produce functionally
divergent transcripts. Although many exon duplications
may be (nearly) neutral and occur by chance, extreme
recurrent cases suggest positive-selective forces. A classic
example is the DSCAM gene, in which exons 6 and 9 have 
undergone massive, independent expansions in different 
insect and crustacean lineages (Brites et al. 2008; Lee 
et al. 2010). Alternative splicing generates many isoforms 
of the DSCAM gene, which encodes receptors involved in
axon guidance, potentially allowing for increased wiring
complexity (Schmucker et al. 2000). In the tropomyosin
cytoskeletal gene, independent duplication of many different
exons has occurred in most bilaterian lineages (Vrhovski et al.
2008; Irimia, Maeso, et al. 2010; Koziol et al. 2011; fig. 1) at a
frequency statistically significantly higher than expected even
from the highest estimates of intragenic duplications (Gao
and Lynch 2009). The explanation appears to lie in the use
of alternative promoters to produce two different protein
isoforms with radically different cellular functions. Following
duplication, each exon copy is "assigned" to one of the two
isoforms, reducing pleiotropy and allowing "general positive
selection" for optimized function of each protein (Irimia,
Maeso, et al. 2010). Finally, another classic example is the
parallel evolution of alternative splicing of recurrent tandem
exon duplicates in ion channel receptors in flies and mammals
(Copley 2004; Fodor and Aldrich 2009).
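
The combinatorial effect of expanding mutually exclusive exon clusters is simple multiplication: if exactly one exon is chosen from each cluster, independently, the number of possible transcripts is the product of the cluster sizes. The sketch below uses the widely cited Drosophila melanogaster Dscam cluster sizes (12, 48, 33, and 2 alternative exons for exons 4, 6, 9, and 17; Schmucker et al. 2000). The independence assumption makes the result an upper bound on the isoforms actually produced, and the "expanded" cluster is purely hypothetical.

```python
import math
from typing import Dict


def mutually_exclusive_isoforms(cluster_sizes: Dict[str, int]) -> int:
    """Number of possible transcripts when exactly one exon is chosen from
    each mutually exclusive cluster and the choices are independent."""
    return math.prod(cluster_sizes.values())


dscam_clusters = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}
print(mutually_exclusive_isoforms(dscam_clusters))  # 38016

# A hypothetical lineage-specific doubling of the exon 6 cluster
# doubles the potential isoform count.
expanded = {**dscam_clusters, "exon 6": 96}
print(mutually_exclusive_isoforms(expanded))  # 76032
```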

Gain or Loss of Individual Introns 

Intron loss is a relatively common process, especially in some 
lineages, so the loss of the same intron in a specific gene is 
likely to occur repeatedly in different lineages simply by 
chance (Roy and Penny 2006; Roy and Irimia 2008a). How- 
ever, certain gene features, such as conserved high expres- 
sion level (Carmel and Koonin 2009), could generate trends 
toward recurrent intron loss from some genes (a case of 
general positive selection). Intron gain, on the other hand, 
is generally thought to be less common, although the extent
of parallel gain has been widely debated (e.g., Csuros
2005; Nguyen et al. 2005; Sverdlov et al. 2005), and
genome-wide comparisons showed that it may account
for up to 8% of the shared intron positions across eukaryotic
genes (Carmel et al. 2007). In addition, clear individual cases
have been identified (Tarrio et al. 2003; Qiu et al. 2004;
Ahmadinejad et al. 2010), even as polymorphisms within
populations (Omilian et al. 2008; Li et al. 2009). Nonetheless,
despite its lower frequency, parallel intron gain is also likely
to occur largely by chance, particularly given that no case of
parallel gain in multiple lineages has been described yet.
Alternatively, intron gain has long been proposed to be biased
toward certain sequences (proto-splice sites; Dibb and
Newman 1989), which could impose a general mutational
pressure underlying the recurrent patterns.
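
Why parallel loss of the same intron is expected by chance can be seen with a back-of-the-envelope calculation. Assuming, hypothetically, that two lineages lose each of N ancestral introns independently with probabilities p_A and p_B, the expected number of introns lost in both is N·p_A·p_B. The function names and numbers below are illustrative only; the simulation simply checks the expectation.

```python
import random


def expected_shared_losses(n_introns: int, p_loss_a: float, p_loss_b: float) -> float:
    """Expected number of ancestral introns lost in both lineages, assuming a
    fixed per-intron loss probability in each lineage and independence."""
    return n_introns * p_loss_a * p_loss_b


def simulate_shared_losses(n_introns: int, p_loss_a: float, p_loss_b: float,
                           seed: int = 0) -> int:
    """Count introns lost in both lineages in one random realization."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_introns)
               if rng.random() < p_loss_a and rng.random() < p_loss_b)


# Illustrative values: 10,000 ancestral introns, 30% lost in one lineage
# and 20% in the other.
print(round(expected_shared_losses(10_000, 0.30, 0.20)))  # 600
print(simulate_shared_losses(10_000, 0.30, 0.20))         # close to 600
```

Under these assumptions, 20% of the losses in the first lineage would coincide with losses in the second, with no selective explanation required.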

Recurrent Loss of Gene Parts 

Repeated loss of coding sequences of genes may provide 
parallel changes in protein function or protein-protein inter- 
actions (e.g., truncation of C-terminal transactivation 
domain in meis/hth proteins, Irimia et al. 2011; and loss 
of Snag domains in C2H2 zinc fingers, Barrallo-Gimeno
and Nieto 2009; Irimia et al. 2010). At the regulatory level,
recurrent loss of cis-regulatory sequences can have major
phenotypic and adaptive consequences with minimal
pleiotropic effects (e.g., repeated deletion of a pelvic
enhancer in stickleback populations; Chan et al. 2010). In
other cases, changes in body plans and/or developmental
programs may render some regulatory elements unnecessary,
even otherwise deeply conserved sequences (e.g., the
only known regulatory element conserved from cnidarians
to vertebrates has been lost [or diverged beyond recognition]
independently in protostomes, tunicates, and hydra; Royo
et al. 2011). Thus, a great variety of causes can be invoked
for this type of genomic change, depending on the gene
and lineages involved (recurrent-environmental
positive-selection, recurrent-environmental and
recurrent-genetic relaxed-selection, general mutation, etc.).

Evolution of Coding Sequences 

Cases of identical changes in amino acid sequences in dif- 
ferent lineages have been extensively studied and represent 
the paradigmatic example of recurrent molecular pheno- 
typic evolution (Doolittle 1994; Zhang and Kumar 1997; 
Christin et al. 2010). Parallel amino acid replacements are 
probably very frequent and happen extensively by chance 
even at generally highly conserved sites (i.e., "rare amino 
acid replacements," RGC_CAMs; Irimia et al. 2007; Rogozin
et al. 2007a, 2007b, 2008; Roy and Irimia 2008b). However,
it has been estimated that homoplastic amino acid
substitutions are 2-fold more common than expected under
neutral models of protein evolution (Rokas and Carroll 2008).
Not surprisingly, then, in addition to the plethora of neutral
cases, many studied examples are linked to recurrent
environmental positive-selective pressures, with amino acid
substitutions conferring adaptive changes to the new
environment (e.g., optimal activity at lower pH conditions
in the aforementioned RNases, Zhang 2006; or changes
in "hearing genes" in mammals with echolocating systems;
Liu et al. 2010; Davies et al. 2011).

The Relationship between Recurrent Genome Evolution and Phenotype

What are the phenotypic effects of this wealth of recurrent 
genomic changes? It is worth noting that, with regard to the 
genotype-phenotype map, the study of recurrent genomic 
changes may be seen as the inverse of the study of recurrent 
phenotypic changes. The study of recurrent phenotypic 
evolution is an inherently "top-down" enterprise (fig. 2): 
study begins with the observation of similar morphological, 
physiological, or even molecular phenotypes and then inves- 
tigates whether or not the underlying genetic changes also 
share similarities (redeployment of the same key develop- 
mental genes or similar types of mutations). Recurrent 
phenotypes may or may not reflect changes in the same 
pathways, the same genes within those pathways, the same 
types of changes within those genes (e.g., exon duplication 
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vs. protein changes), the same specific change (e.g., a spe- 
cific amino acid change), or the same genome-level change 
giving rise to the transcript/protein change (e.g., Threonine- 
to-Serine changes can occur due to substitutions at the first 
or third codon position). Even if the transcript changes are 
the same, this could reflect identical or nonidentical changes 
in the genome (e.g., genomic change vs. RNA editing). In all 
cases, the organismal phenotypes are equivalent, regardless 
of the similarity or difference of their genomic bases. 
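
The codon-level point can be checked directly by enumerating single-nucleotide substitutions under the standard genetic code. The sketch below hard-codes only the threonine and serine codons (the helper name is hypothetical) and reports the codon positions at which a single substitution can turn a threonine codon into a serine codon.

```python
# Standard genetic code, restricted to the two amino acids of interest.
THR_CODONS = {"ACT", "ACC", "ACA", "ACG"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def substitution_positions(source: set, target: set) -> set:
    """1-based codon positions at which a single-nucleotide substitution
    converts at least one codon in `source` into a codon in `target`."""
    positions = set()
    for codon in source:
        for pos in range(3):
            for base in "ACGT":
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in target:
                    positions.add(pos + 1)
    return positions


print(sorted(substitution_positions(THR_CODONS, SER_CODONS)))  # [1, 2]
```

For example, ACT to TCT changes the first position and ACT to AGT the second; no third-position change takes a codon out of the ACN threonine box.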

By contrast, study of recurrent genomic evolution is a 
fundamentally bottom-up pursuit (fig. 2): study begins with 
an observation of similarity encoded at the genomic level 
(e.g., independently duplicated exons in tropomyosin genes) 
and then investigates whether or not these similarities are 
reflected in resemblance at phenotypic levels (optimization 
of the same two protein functions). For instance, consider 
a recurrent intragenic tandem duplication. The duplications 
may affect the transcriptome or may not (e.g., an intronic 
duplication may not). Exonic duplications may affect the 
protein sequence/function/structure or may not (e.g., an 
exon in a UTR). Protein-affecting changes may or may 
not affect cellular/organismal phenotype. Fundamentally, 
then, whereas repeated phenotypic evolution may speak
directly to adaptive value but only rarely (and sometimes
indirectly) to the evolutionary mechanisms of genetic
change, recurrent genomic evolution directly informs about
the genetic changes themselves, although adaptive causes
can remain more elusive. The types and extents of phenotypic
changes due to recurrent genomic changes, and the
similarities of these changes across lineages, remain largely
unknown and represent an important set of questions in
understanding recurrent evolution.

What Do Recurrent Genomic Features Then Tell Us about Evolution?

Genomic recurrence provides a new perspective on
evolutionary processes, informing us in often unexpected
ways about commonalities of forces (mutational and/or
selective) acting across different lineages. Cases of
genomic recurrence caused by ratchet mutations are
fundamental to understanding the evolutionary constraints
and canalizations that shape the way in which "genome
space," like morphospace, is explored through evolution,
underscoring predictability in the overall outcome of neutral
mutation, whether or not it is "constructive" (Stoltzfus
1999; Gray et al. 2010; Doolittle et al. 2011; Speijer 2011). For
example, the observation of recurrent emergence of SLTS
suggests that the mutational path to a new SLTS system is
readily available over long evolutionary times; on the other
hand, the lack of reversion from SLTS to non-SLTS presumably
indicates general selective forces opposing loss of SLTS, for
instance due to loss of the machinery involved in the
non-SLTS-dependent expression of the genes subject to SLTS.

Other quasineutral changes that have been repeatedly 
used as substrate for molecular innovations suggest that cer- 
tain genomic traits confer evolutionary flexibility, opening 
new avenues that can be explored during evolution. Thus,
their mere presence would be indicative of evolutionary po- 
tential, allowing specific hypotheses about the occurrence 
of typically accompanying features (e.g., reorganization 
of conserved synteny after WGDs or the creation of operons 
in the presence of SLTS). 

In other cases, although cellular/organismal phenotypic 
consequences of genomic recurrence may not be immedi- 
ately evident, careful study of genomic patterns can provide 
straightforward testable hypotheses about phenotypic 
consequences. For instance, the observation of recurrent 
evolution of gastrointestinal RNase paralogs in two
leaf-eating monkey lineages led to specific predictions that
protein sequence changes in the gastrointestinal RNase gene
would enhance digestion, which were later experimentally
confirmed (Zhang 2006).

However, it is in the less predictable cases that the
study of recurrent genome evolution arguably reaches
the height of its power. For instance, the finding that splicing
motifs become highly similar among the remaining introns 
in nearly intronless species came as a profound surprise 
(Irimia et al. 2007; Irimia and Roy 2008; Schwartz et al. 
2008). This pattern indicates a rule that is at the same time 
extremely clear and poorly understood: In the context of 
(or following) nearly complete intron loss, selection for con- 
sensus sequences increases on remaining introns. In such 
cases, the repeatability of the evolutionary outcomes is likely
to point to specific ways in which selection acts on these
features, illuminating the path for future research.

Concluding Remarks 

The diverse instances discussed here represent only a subset 
of the known cases of repeated evolution at the genome 
level that have been found largely serendipitously, suggesting
that recurrent patterns of genome evolution are widespread.
In addition, although recurrent evolution can occur
by sheer chance, the above examples provide extensive
evidence that genomic recurrence often responds to specific
evolutionary forces.

Just as ancestrally shared features are the result of a common
evolutionary history, shared features that arose by recurrent
evolution are often the result of common evolutionary
forces acting on different lineages. These cases improve
our understanding of both the causes and the modes of
genome evolution, allowing us to make specific predictions
about evolutionary outcomes. Unraveling the manifold
significance of repeated genomic outputs will necessarily
require comprehensive and systematic analyses of recurrent
phenomena, as well as rigorous statistical testing and greater
phylogenetic sampling to assess the dynamics underlying
observed cases of convergence. Given the increasing
availability of complete genome sequences, these analyses
are increasingly possible, and, as with replicates in
experimental research, recurrent events will help us to sketch
an increasingly focused picture of genome evolution.